## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
I will explore every variable in turn.
This shows an approximately normal distribution, concentrated around the central values.
The distribution of Fixed Acidity is positively skewed. The median is around 8, with a high concentration of wines near that value.
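As a rough numerical check on a skew like the one described for Fixed Acidity, one could compare the mean with the median and compute a moment-based skewness. The sketch below is only an illustration: the helper `skewness` and the vector `toy_acidity` are hypothetical, and the values are made up rather than taken from the wine data.

```r
# Moment-based sample skewness: positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Made-up stand-in values, NOT the wine data.
toy_acidity <- c(6.8, 7.1, 7.4, 7.9, 8.0, 8.3, 9.2, 11.5, 15.9)

skew_value <- skewness(toy_acidity)                       # positive for this sample
mean_above_median <- mean(toy_acidity) > median(toy_acidity)  # right tail pulls the mean up

print(skew_value)
print(mean_above_median)
```

In the summary table, the mean of fixed.acidity (8.32) sits above its median (7.90), which is consistent with a positive skew.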
The distribution of Volatile Acidity looks bimodal, with two peaks around 0.4 and 0.6.
The distribution of Citric Acid looks strange. Some higher values have no data at all and, apart from them, the distribution looks almost rectangular. Maybe there was an error in the data, or the data collected was incomplete.
Residual Sugar shows a high concentration of wines around 2.2 (the median), with some outliers in the higher ranges.
Chlorides show a distribution similar to that of Residual Sugar. Extreme outliers have been removed from this plot.
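One common way to drop extreme outliers before plotting is to keep only the values below an upper quantile. The sketch below assumes a 95% cutoff; the analysis does not state which cutoff was used, and `trim_upper` and `toy_chlorides` are hypothetical names with made-up values.

```r
# Keep only values at or below an upper quantile. The 95% cutoff is an
# assumption for illustration; the original analysis does not state its cutoff.
trim_upper <- function(x, prob = 0.95) {
  cutoff <- quantile(x, probs = prob, na.rm = TRUE)
  x[!is.na(x) & x <= cutoff]
}

# Made-up stand-in values, NOT the wine data.
toy_chlorides <- c(0.05, 0.07, 0.07, 0.08, 0.08, 0.09, 0.09, 0.10, 0.12, 0.61)

trimmed <- trim_upper(toy_chlorides)

# Histogram of the trimmed values, so the bulk of the distribution is visible.
hist(trimmed, main = "Chlorides (trimmed)", xlab = "chlorides")
```

Trimming only changes what the plot displays; summary statistics reported elsewhere would still be computed on the full column unless the trimmed vector is used there too.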
distribution with very few wines over 60.
As expected, this distribution closely resembles the previous one.
The distribution of density looks very close to normal.
pH also looks normally distributed.
For sulphates we see a distribution similar to the ones of residual.sugar and chlorides.
There is a long-tailed distribution in sulfur.dioxide.
There are 1599 observations of wines in the dataset, with 12 features. There is one categorical variable (quality); the others are numerical variables that describe the physical and chemical properties of the wine.
Other observations: The median quality is 6, which on the given scale (1-10) is a mediocre wine. The best wine in the sample has a score of 8, and the worst has a score of 3.
quality of wines.

### What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

The variables related to acidity (fixed, volatile, citric.acid and pH) might explain some of the variance. I suspect the different acid concentrations might alter the taste of the wine. Also, residual.sugar dictates how sweet a wine is and might also influence taste.

### Did you create any new variables from existing variables in the dataset?

No, I did not create any new variables.

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Citric.acid stood out from the other distributions. Apart from some outliers, it had a rectangular-looking distribution, which seems very unexpected given the wine quality distribution.
As we can see, Fixed Acidity has almost no effect on Quality. The mean and median values of fixed acidity remain almost unchanged as quality increases.
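A comparison of mean and median per quality level can be sketched with `tapply`. The data frame `toy_by_quality` below is a made-up stand-in with the same column names as the wine data, so the numbers are illustrative only.

```r
# Made-up stand-in values, NOT the wine data.
toy_by_quality <- data.frame(
  quality       = c(5, 5, 5, 6, 6, 6, 7, 7, 7),
  fixed.acidity = c(7.4, 7.8, 8.1, 7.5, 7.9, 8.2, 7.6, 7.9, 8.3)
)

# Mean and median of fixed.acidity within each quality level.
group_mean   <- tapply(toy_by_quality$fixed.acidity, toy_by_quality$quality, mean)
group_median <- tapply(toy_by_quality$fixed.acidity, toy_by_quality$quality, median)

# One row per quality level; near-constant columns suggest little relationship.
print(cbind(mean = group_mean, median = group_median))
```

The same pattern applies to any other column by swapping `fixed.acidity` for the variable of interest.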
Volatile acid seems to have a negative impact on the quality of the wine. As volatile acid level goes up, the quality of the wine degrades.
Citric acid seems to have a positive correlation with Wine Quality. Better wines have higher Citric Acid.
The chart shows that residual sugar has no apparent effect on the quality of the wine.
From the previous chart, we see that a lower concentration of Chlorides goes with better wines.
We see here that too low a concentration of Free Sulphur Dioxide produces poor wine, and too high a concentration results in average wine.
As Free Sulphur Dioxide is a subset of Total Sulphur Dioxide, we see a similar pattern here.
It seems that lower density goes with higher-quality wine.
Better wines seem to have a lower pH, but the effect on quality is small.
Even though we see many outliers in the ‘Average’ quality wine, it seems that better wines have a stronger concentration of Sulphates.
The correlation is quite distinct here: better wines clearly have a higher Alcohol content. However, there are a great number of outliers, so alcohol alone may not determine whether a wine is of good quality. Let's fit a simple linear model and look at the statistics.
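A single-predictor model of this kind can be fitted with `lm`. The sketch below uses the same formula as the call in the model output, but runs on a made-up data frame `toy_fit` instead of `red`, so its coefficients will not match the real results.

```r
# Made-up stand-in values, NOT the wine data.
toy_fit <- data.frame(
  alcohol = c(9.0, 9.4, 9.8, 10.2, 10.5, 11.0, 11.4, 12.0, 12.5, 13.0),
  quality = c(5, 5, 5, 6, 5, 6, 6, 7, 6, 7)
)

# Simple linear regression of quality on alcohol.
fit <- lm(as.numeric(quality) ~ alcohol, data = toy_fit)
print(summary(fit))

# Pull out the two numbers discussed in the text.
slope <- coef(fit)[["alcohol"]]      # change in quality per unit of alcohol
r2    <- summary(fit)$r.squared      # share of variance explained by the model
```

With a single predictor and an intercept, `r2` equals the squared Pearson correlation between the predictor and the response.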
##
## Call:
## lm(formula = as.numeric(quality) ~ alcohol, data = red)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8442 -0.4112 -0.1690 0.5166 2.5888
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.87497 0.17471 10.73 <2e-16 ***
## alcohol 0.36084 0.01668 21.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7104 on 1597 degrees of freedom
## Multiple R-squared: 0.2267, Adjusted R-squared: 0.2263
## F-statistic: 468.3 on 1 and 1597 DF, p-value: < 2.2e-16
Based on the value of R-squared (0.2267), alcohol alone explains only about 23% of the variance in wine quality, so there must be other variables at play. I have to identify them in order to build a better regression model.
Next, I compute the correlation of each variable with the quality of the wine.
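One way to get a correlation for every variable at once is to apply `cor` over the predictor columns. This is a minimal sketch on a made-up data frame `toy_cor`; it does not reproduce the log10 transformations that some of the names in the output suggest.

```r
# Made-up stand-in values, NOT the wine data.
toy_cor <- data.frame(
  alcohol          = c(9.0, 9.4, 9.8, 10.2, 10.5, 11.0, 11.4, 12.0, 12.5, 13.0),
  volatile.acidity = c(0.80, 0.70, 0.75, 0.60, 0.65, 0.50, 0.55, 0.40, 0.45, 0.30),
  quality          = c(5, 5, 5, 6, 5, 6, 6, 7, 6, 7)
)

# Pearson correlation of every predictor column with quality.
predictors <- setdiff(names(toy_cor), "quality")
cors <- sapply(toy_cor[predictors], function(v) cor(v, toy_cor$quality))

# Strongest relationships first, regardless of sign.
print(cors[order(abs(cors), decreasing = TRUE)])

# cor.test adds a confidence interval and p-value for a single pair.
print(cor.test(toy_cor$alcohol, toy_cor$quality))
```

Sorting by absolute value keeps strong negative correlations, such as the one for volatile acidity, near the top of the list.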
## fixed.acidity volatile.acidity citric.acid
## 0.12405165 -0.39055778 0.22637251
## log10.residual.sugar log10.chlordies free.sulfur.dioxide
## 0.02353331 -0.17613996 -0.05065606
## total.sulfur.dioxide density pH
## -0.18510029 -0.17491923 -0.05773139
## log10.sulphates alcohol
## 0.30864193 0.47616632
From the correlation test, the following variables have the strongest correlation with Wine Quality: alcohol, sulphates and citric acid (positive), and volatile acidity (negative).
There is a negative relationship between volatile acidity and quality, and lower densities go with better wine.

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Free and total sulfur dioxide appear to be very closely related.

### What was the strongest relationship you found?

The relationship between total sulfur dioxide and free sulfur dioxide.
Wines with lower volatile acidity and higher alcohol tend to be of better quality.
We can see that higher-quality wines have higher alcohol and higher citric acid.
It seems that higher levels of both alcohol and sulphates go with higher-quality wine.
Low volatile acidity and high sulphates go with good wine.
Low volatile acidity and high citric acid go with good-quality wine.
# Multivariate Analysis
Sugar has no apparent effect on the quality of the wine.

### OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.
This chart shows that high alcohol combined with low volatile.acidity has a strong influence on the quality of wines. This is because alcohol has a positive correlation with quality, while volatile acidity has a negative correlation with it.
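To check whether volatile acidity adds information beyond alcohol, one could compare a one-predictor model with a two-predictor model. This is a sketch under assumptions, not a model from the original analysis: `toy_model` is a made-up data frame, and the two formulas are chosen only to illustrate the comparison.

```r
# Made-up stand-in values, NOT the wine data.
toy_model <- data.frame(
  alcohol          = c(9.0, 9.4, 9.8, 10.2, 10.5, 11.0, 11.4, 12.0, 12.5, 13.0),
  volatile.acidity = c(0.80, 0.55, 0.70, 0.60, 0.75, 0.45, 0.65, 0.35, 0.50, 0.30),
  quality          = c(5, 5, 5, 6, 5, 6, 6, 7, 6, 7)
)

# Nested models: the second adds volatile.acidity to the first.
fit_one <- lm(quality ~ alcohol, data = toy_model)
fit_two <- lm(quality ~ alcohol + volatile.acidity, data = toy_model)

# R-squared never decreases when a predictor is added, so adjusted
# R-squared is the fairer comparison between the two models.
r2_one  <- summary(fit_one)$r.squared
r2_two  <- summary(fit_two)$r.squared
adj_one <- summary(fit_one)$adj.r.squared
adj_two <- summary(fit_two)$adj.r.squared
print(c(adj_one = adj_one, adj_two = adj_two))

# F-test for whether the extra predictor improves the fit.
print(anova(fit_one, fit_two))
```

A linear model treats quality as a continuous number, which is a simplification for an ordered score.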
Every examination so far suggests that high alcohol and high sulphate concentrations combined seem to produce better wines, because both alcohol and sulphates have a positive correlation with quality.
Every examination so far also suggests that higher sulphates and higher citric acid go with much higher-quality wine.
The biggest challenge I faced when I started to analyze this dataset was that many variables might be responsible for, or related to, wine quality, and I had to determine which of them actually affect it.
So I started by examining each variable on its own, looking at the general shape of its distribution and noting anything abnormal. As I expected, I found that many variables, such as residual sugar, have no effect on wine quality.
I then computed the linear correlation of each variable with quality and found the four factors that affect quality most. Three of them have a positive correlation:

1. Alcohol
2. Sulphates
3. Citric Acid

and only one has a negative correlation: volatile acidity.
I then analyzed those factors together.
In the final part of my analysis, I plotted multivariate plots to see whether some combinations of variables together affected the overall quality of the wine. I found that alcohol has a big effect on quality, as does citric acid.
For future analysis, I would love to have a dataset where, apart from the wine quality, each wine is also ranked by 5 different wine tasters. When we include the human element, opinions change with many different factors. By including that element in my analysis, I would be able to bring in that perspective and see many unseen factors that might result in a better or worse wine quality. Having these factors in the dataset would give a different insight altogether.